RUBY-3616 Fix load balanced implementation#3013
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the Ruby driver’s load-balanced (LB) connection pinning/unpinning behavior so previously skipped LB unified spec tests can run and pass.
Changes:
- Enable previously skipped LB unified spec tests and add a new session-end pin release test.
- Implement multi-reason connection pinning (cursor vs transaction) and update session/cursor/pool logic to correctly retain or release LB connections.
- Extend the unified test runner to support additional spec assertions and options (e.g.,
batchSize,session, CMAPreason).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/spec_tests/data/load_balancers/wait-queue-timeouts.yml | Unskips LB wait-queue timeout tests. |
| spec/spec_tests/data/load_balancers/transactions.yml | Unskips LB transaction pinning tests; adds “release on endSession” coverage. |
| spec/spec_tests/data/load_balancers/sdam-error-handling.yml | Unskips LB SDAM error handling tests; adjusts failpoint/expectation fields. |
| spec/spec_tests/data/load_balancers/cursors.yml | Unskips LB cursor pinning tests. |
| spec/runners/unified/ddl_operations.rb | Adds batchSize support for listCollections/listIndexes unified operations. |
| spec/runners/unified/crud_operations.rb | Adds session support for createFindCursor unified operation. |
| spec/runners/unified/assertions.rb | Adds CMAP event reason matching in unified assertions. |
| lib/mongo/session.rb | Ensures pinned LB transaction connections are released on end_session; stores connection object when pinning. |
| lib/mongo/server/connection.rb | Replaces boolean pinning with reason-based pin tracking (Set). |
| lib/mongo/server/connection_pool.rb | Adds pinned-connection reuse and avoids auto check-in for pinned connections. |
| lib/mongo/server_selector/base.rb | Releases pinned LB transaction connection when session is no longer in a transaction. |
| lib/mongo/operation/shared/executable.rb | Pins LB transaction connections with reason :transaction and avoids pinning during commit/abort. |
| lib/mongo/database/view.rb | Threads batch_size into listCollections execution path. |
| lib/mongo/cursor.rb | Skips killCursors after getMore network errors; unpins/checks-in LB connections more carefully. |
| lib/mongo/collection/view/iterable.rb | Reuses an already checked-out pinned connection for LB cursor initial queries when applicable. |
| lib/mongo/collection/view/change_stream.rb | Changes LB change stream initial query connection handling (manual checkout). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if server.load_balancer? | ||
| # In load balanced topology, manually check out a connection | ||
| # so it remains checked out and pinned to the cursor. | ||
| connection = server.pool.check_out(context: context) | ||
| result = send_initial_query(connection, context) | ||
|
|
||
| start_at_operation_time = if doc = result.replies.first && result.replies.first.documents.first | ||
| doc['operationTime'] |
There was a problem hiding this comment.
In load-balanced mode this code checks out a connection manually but does not guarantee it is checked back into the pool if send_initial_query (or later logic) raises. That can leak checked-out connections and eventually exhaust the pool. Consider using server.with_connection again now that ConnectionPool#with_connection skips check-in for pinned connections, or add an ensure that checks the connection in (when it is not pinned/owned by the created cursor).
| with_overload_retry(context: possibly_refreshed_context) do | ||
| process(execute_operation(get_more_operation)) | ||
| end | ||
| rescue Error::SocketError, Error::SocketTimeoutError |
There was a problem hiding this comment.
@get_more_network_error is only set for SocketError and SocketTimeoutError, but Error::ConnectionPerished also represents a network error (and is raised when a perished connection is reused). If get_more raises ConnectionPerished, close will still attempt to send killCursors, which the load balancer cursor spec says should not be attempted after a network error during getMore. Consider including Error::ConnectionPerished in this rescue that sets @get_more_network_error.
| rescue Error::SocketError, Error::SocketTimeoutError | |
| rescue Error::SocketError, Error::SocketTimeoutError, Error::ConnectionPerished |
No description provided.